A density matrix $\rho$ may be represented in many different ways as a mixture of pure states, $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. This paper characterizes the class of probability distributions $(p_i)$ that may appear in such a decomposition, for a fixed density matrix $\rho$. Several illustrative applications of this result to quantum mechanics and quantum information theory are given.
I. INTRODUCTION

The density matrix was introduced as a means of describing a quantum system when the state of the system is not completely known. In particular, if the state of the system is $|\psi_i\rangle$ with probability $p_i$, then the density matrix is defined by

$$\rho \equiv \sum_i p_i |\psi_i\rangle\langle\psi_i|. \qquad (1.1)$$

For a fixed density matrix it is natural to ask: what class of ensembles $\{p_i, |\psi_i\rangle\}$ gives rise to that density matrix? This problem was addressed by Schrödinger [?], whose results have been extended by Jaynes [?] and by Hughston, Jozsa, and Wootters [?]. The result of these investigations, the classification theorem for ensembles, has been of considerable utility in quantum statistical mechanics, quantum information theory, quantum computation, and quantum error-correction.

In this paper we use the classification theorem for ensembles to obtain an explicit classification of probability distributions $(p_i)$ such that there exist pure states $|\psi_i\rangle$ satisfying $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ for some fixed density matrix $\rho$. This is done in Section II. Section III illustrates the result with several simple applications to quantum mechanics and quantum information theory. Section IV concludes the paper.



II. PROBABILITY DISTRIBUTIONS 
CONSISTENT WITH A MIXED STATE 

To state and prove our results we need to introduce some notions from the theory of majorization [?]. Majorization is an area of mathematics concerned with the problem of comparing two vectors to determine which is more "disordered". Suppose $x$ and $y$ are two $d$-dimensional real vectors. Then we say $x$ is majorized by $y$, written $x \prec y$, if

$$\sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow} \qquad (2.1)$$

for $k = 1, \ldots, d-1$, with equality required when $k = d$. The $\downarrow$ notation indicates that the vector components are to be ordered into decreasing order. The usual interpretation is that $x$ is more "disordered" or "mixed" than $y$. When $x$ and $y$ are probability distributions it can be shown that $x \prec y$ implies that many quantities commonly used as measures of disorder, such as the Shannon entropy, are never lower for $x$ than for $y$.
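As a concrete reading of the definition, the following Python sketch tests $x \prec y$ by comparing the partial sums of the decreasingly ordered components. The function name `is_majorized` and the floating-point tolerance `tol` are our own illustrative choices, not part of the paper.

```python
def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y (x ≺ y), up to a tolerance tol."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)  # x↓
    ys = sorted(y, reverse=True)  # y↓
    sx = sy = 0.0
    for k in range(len(xs)):
        sx += xs[k]
        sy += ys[k]
        if sx > sy + tol:  # partial-sum inequality (2.1) fails at this k
            return False
    return abs(sx - sy) <= tol  # equality is required at k = d


print(is_majorized([1/3, 1/3, 1/3], [0.5, 0.3, 0.2]))  # True
print(is_majorized([0.5, 0.3, 0.2], [1/3, 1/3, 1/3]))  # False
```

The uniform distribution is majorized by every probability vector of the same length, which is why the first call succeeds and the reversed call fails.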

There is a close relation between unitary matrices and majorization. Any matrix $D$ whose components may be written in the form $D_{ij} = |u_{ij}|^2$ for some unitary matrix $u = (u_{ij})$ is said to be unitary-stochastic. The following theorem [?] connects the unitary-stochastic matrices to majorization.

Theorem 1: Let $x$ and $y$ be $d$-dimensional vectors. Then $x \prec y$ if and only if there exists a unitary-stochastic $D$ such that $x = Dy$.
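For $d = 2$ the content of Theorem 1 can be checked by hand. The following worked example is our own illustration, not taken from the paper: take $u$ to be a real rotation by an angle $\theta$, so that $D$ is ortho-stochastic.

```latex
u = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
D = \begin{pmatrix} \cos^2\theta & \sin^2\theta \\ \sin^2\theta & \cos^2\theta \end{pmatrix},
\qquad
Dy = \begin{pmatrix} \cos^2\theta\, y_1 + \sin^2\theta\, y_2 \\ \sin^2\theta\, y_1 + \cos^2\theta\, y_2 \end{pmatrix}.
```

With $y_1 \ge y_2$, each component of $Dy$ lies between $y_2$ and $y_1$ and the two components sum to $y_1 + y_2$, which is exactly the condition $Dy \prec y$. Conversely, if $x \prec y$ then $x_1^{\downarrow} = t y_1 + (1-t) y_2$ for some $t \in [0,1]$, and choosing $\cos^2\theta = t$ or $\sin^2\theta = t$ (depending on which component of $x$ is the larger) gives $x = Dy$.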

The proof of this theorem [9] is constructive in nature. That is, given $x \prec y$ it is possible to explicitly construct a unitary matrix $u = (u_{ij})$ such that $x = Dy$, where $D_{ij} = |u_{ij}|^2$. Indeed, even more is true: for the forward implication in Theorem 1 it turns out to be sufficient to consider only orthogonal matrices $u$, that is, real matrices satisfying $uu^T = u^T u = I$, where $T$ is the transpose operation. The corresponding matrix $D_{ij} = u_{ij}^2$ is known as an ortho-stochastic matrix. Note that the expression $u_{ij}^2$ indicates the square of the $ij$th component of the matrix $u$, not the $ij$th component of $u^2$. The Appendix to this paper gives an outline of the construction needed for the implication $x \prec y \Rightarrow x = Dy$ in Theorem 1, somewhat different from the proof in [9].

The second result we need is the classification theorem for ensembles:

Theorem 2: Let $\rho$ be a density matrix. Then $\{p_i, |\psi_i\rangle\}$ is an ensemble for $\rho$ if and only if there exists a unitary matrix $u = (u_{ij})$ such that

$$\sqrt{p_i}\,|\psi_i\rangle = \sum_j u_{ij} \sqrt{\lambda_j^{\rho}}\, |e_j\rangle, \qquad (2.2)$$

where the $|e_j\rangle$ are orthonormal eigenvectors of $\rho$, so that $\langle e_j|e_k\rangle = \delta_{jk}$, and the $\lambda_j^{\rho}$ are the corresponding eigenvalues.



In the statement of Theorem 2 it is understood that there may be more elements in the ensemble $\{p_i, |\psi_i\rangle\}$ than there are eigenvectors $|e_j\rangle$. When this is the case one appends extra zero vectors to the list of eigenvectors until the number of elements in the two lists matches.
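To make Theorem 2 and the zero-padding convention concrete, the Python sketch below builds a three-member ensemble for a qubit density matrix. Everything in it is our illustrative assumption rather than the paper's: $\rho$ is taken diagonal with eigenvalues $(0.7, 0.3)$, padded with one zero, and $u$ is an arbitrary real orthogonal (hence unitary) $3 \times 3$ matrix. The function name `ensemble_from_unitary` is hypothetical.

```python
import math


def ensemble_from_unitary(lams, u, dim):
    """Sketch of the construction in Theorem 2 for a real orthogonal u.

    Assumes rho = diag(lams[:dim]) in the standard basis of R^dim, so |e_j> is
    the j-th standard basis vector for j < dim; lams is padded with zeros to
    len(u), and the padded eigenvectors |e_j> (j >= dim) are zero vectors.
    Returns (p, psis): the probabilities p_i and the states |psi_i>.
    """
    m = len(u)
    p, psis = [], []
    for i in range(m):
        # sqrt(p_i)|psi_i> = sum_j u_ij sqrt(lam_j)|e_j>; padded zero vectors drop out.
        vec = [u[i][j] * math.sqrt(lams[j]) for j in range(dim)]
        p_i = sum(c * c for c in vec)  # equals sum_j u_ij^2 lam_j
        p.append(p_i)
        # When p_i = 0 any normalized state would do; the zero vector is a placeholder.
        psis.append([c / math.sqrt(p_i) for c in vec] if p_i > 0 else vec)
    return p, psis


u = [[2/3, 2/3, 1/3],
     [-2/3, 1/3, 2/3],
     [1/3, -2/3, 2/3]]   # real orthogonal, hence unitary
lams = [0.7, 0.3, 0.0]   # eigenvalues of a qubit rho, padded with one zero
p, psis = ensemble_from_unitary(lams, u, dim=2)
rho = [[sum(pi * s[a] * s[b] for pi, s in zip(p, psis)) for b in range(2)]
       for a in range(2)]
print([round(x, 6) for x in p])  # [0.444444, 0.344444, 0.211111]
```

Summing $p_i |\psi_i\rangle\langle\psi_i|$ reproduces $\mathrm{diag}(0.7, 0.3)$ because the columns of $u$ are orthonormal, and the probabilities are exactly $p_i = \sum_j u_{ij}^2 \lambda_j$.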
Combining Theorem 1 and Theorem 2 in an appropriate way gives the following classification theorem for the class of probability distributions consistent with a given density matrix:

Theorem 3: Suppose $\rho$ is a density matrix. Let $(p_i)$ be a probability distribution. Then there exist normalized quantum states $|\psi_i\rangle$ such that

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| \qquad (2.3)$$

if and only if $(p_i) \prec \lambda^{\rho}$, where $\lambda^{\rho}$ is the vector of eigenvalues of $\rho$.

In the statement of Theorem 3 it is understood that if the vector $(p_i)$ contains more elements than the vector $\lambda^{\rho}$, then one should append sufficiently many zeros to $\lambda^{\rho}$ that the two vectors are of the same length.

Proof of Theorem 3:

Suppose there exists a set of states such that $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. By Theorem 2, Equation (2.2) must hold. Multiplying (2.2) by its adjoint gives

$$p_i \langle\psi_i|\psi_i\rangle = \sum_{jk} u_{ik}^{*} u_{ij} \sqrt{\lambda_j^{\rho} \lambda_k^{\rho}}\, \delta_{jk}, \qquad (2.4)$$

which simplifies to

$$p_i = \sum_j |u_{ij}|^2 \lambda_j^{\rho}. \qquad (2.5)$$

Setting $D_{ij} \equiv |u_{ij}|^2$, we have $(p_i) = D\lambda^{\rho}$ for unitary-stochastic $D$, and by Theorem 1, $(p_i) \prec \lambda^{\rho}$.

Conversely, if $(p_i) \prec \lambda^{\rho}$ then by Theorem 1 we can find unitary $u$ such that (2.5) is satisfied. Now define states $|\psi_i\rangle$ by Equation (2.2); since $u_{ij}$, $p_i$ and $|e_j\rangle$ are known, this equation determines the $|\psi_i\rangle$ uniquely whenever $p_i > 0$. (If $p_i = 0$, then (2.5) forces $u_{ij} = 0$ whenever $\lambda_j^{\rho} > 0$, so both sides of (2.2) vanish and $|\psi_i\rangle$ may be any normalized state.) By Theorem 2 we need only check that these are properly normalized pure states to complete the proof. Multiplying Equation (2.2) by its adjoint gives

$$p_i \langle\psi_i|\psi_i\rangle = \sum_{jk} u_{ik}^{*} u_{ij} \sqrt{\lambda_j^{\rho} \lambda_k^{\rho}}\, \langle e_k|e_j\rangle \qquad (2.6)$$

$$= \sum_j |u_{ij}|^2 \lambda_j^{\rho} \qquad (2.7)$$

$$= p_i, \qquad (2.8)$$

where the last step follows from the choice of $u$ to satisfy (2.5). It follows that $|\psi_i\rangle$ is a normalized pure state.
QED

Theorem 3 is the central result of this paper. Many elements of the proof are already implicit in the paper of Hughston, Jozsa and Wootters [?]; however, they do not explicitly draw the connection with majorization. The forward implication has been proved by Uhlmann [10], who conjectured but did not find an explicit construction for the reverse implication.

III. APPLICATIONS

The remainder of this paper demonstrates several illustrative applications of Theorem 3 to elementary quantum mechanics and quantum information theory.

A. Uniform ensembles exist for any density matrix
As our first application of Theorem 3, suppose $d$ is the rank of $\rho$, and that $m \ge d$. Then it is easy to verify that $(1/m, 1/m, \ldots, 1/m) \prec \lambda^{\rho}$, since the $k$ largest eigenvalues of $\rho$ sum to at least $\min(k/d, 1) \ge k/m$. Therefore there exist pure states $|\psi_1\rangle, \ldots, |\psi_m\rangle$ such that $\rho$ is an equal mixture of these states with probability $1/m$,

$$\rho = \sum_{i=1}^{m} \frac{1}{m} |\psi_i\rangle\langle\psi_i|. \qquad (3.1)$$

Indeed, if we choose $m \ge d$ where $d$ is the dimension of the underlying space, then for any $\rho$ there exists a set of states such that (3.1) holds. A priori it is not at all obvious that such a set of pure states should exist for any density matrix $\rho$; however, Theorem 3 guarantees that this is indeed the case: any density matrix may be regarded as the result of picking uniformly at random from some ensemble of pure states.
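One explicit way to realize (3.1), which is our illustration rather than a construction given in the paper, is to take the unitary $u$ of Theorem 2 to be the $m \times m$ discrete Fourier matrix, $u_{ij} = e^{2\pi i\, ij/m}/\sqrt{m}$. Then $|u_{ij}|^2 = 1/m$ for all $i, j$, so $p_i = \sum_j |u_{ij}|^2 \lambda_j = 1/m$ once $\lambda^{\rho}$ is padded with zeros to length $m$. The sketch assumes $\rho$ is diagonal in the standard basis and $m$ is at least the length of the eigenvalue list; `uniform_ensemble` is a hypothetical name.

```python
import cmath
import math


def uniform_ensemble(lams, m):
    """Return m normalized states with rho = (1/m) sum_i |psi_i><psi_i|.

    Sketch assuming rho = diag(lams) in the standard basis and m >= len(lams).
    The unitary of Theorem 2 is the m x m Fourier matrix
    u_ij = exp(2 pi i ij / m) / sqrt(m), so every p_i equals 1/m.
    """
    if m < len(lams):
        raise ValueError("this sketch needs m >= len(lams)")
    # |psi_i> = sum_j (u_ij / sqrt(1/m)) sqrt(lam_j)|e_j> = sum_j exp(2 pi i ij/m) sqrt(lam_j)|e_j>
    return [[cmath.exp(2j * math.pi * i * j / m) * math.sqrt(lam)
             for j, lam in enumerate(lams)]
            for i in range(m)]


lams = [0.5, 0.3, 0.2]
states = uniform_ensemble(lams, m=4)
rho = [[sum(s[a] * s[b].conjugate() for s in states) / len(states) for b in range(3)]
       for a in range(3)]
# rho reproduces diag(0.5, 0.3, 0.2) up to rounding error
```

The off-diagonal entries cancel because $\sum_{i=0}^{m-1} e^{2\pi i\, i(a-b)/m} = 0$ whenever $0 < |a - b| < m$.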



B. Schur-convex functions of ensemble probabilities

A second application of Theorem 3 relates functions of the eigenvalues of $\rho$ to functions of the probabilities $(p_i)$. The theory of isotone functions [6] is concerned with functions which preserve the majorization order. More specifically, the Schur-convex functions are real-valued functions $f$ such that $x \prec y$ implies $f(x) \le f(y)$. Examples of Schur-convex functions include $f(x) = \sum_i x_i \log(x_i)$, $f(x) \equiv \sum_i x_i^k$ (for any constant $k > 1$), $f(x) \equiv -\prod_i x_i$, and $f(x) = -x_d^{\downarrow}$. More examples and a characterization of the Schur-convex functions may be found in [?]. Each such Schur-convex function gives rise to an inequality relating the vector of probabilities $(p_i)$ in Equation (2.3) to the vector $\lambda^{\rho}$. For example, the Schur-convexity of $\sum_i x_i \log(x_i)$ gives the useful inequality $H(p_i) \ge S(\rho)$, where $H(\cdot)$ is the Shannon entropy and $S(\cdot)$ is the von Neumann entropy. (This result was obtained by Lanford and Robinson [?] using different techniques.) In general, any Schur-convex function will give rise to a similar inequality relating $(p_i)$ and $\lambda^{\rho}$. A similar property related to convex functions has previously been noted (see the review [12] for an overview, as well as the original references [10], [13]-[16]); however, those results are a special case of the more general result given here based upon Schur-convex functions. The earlier results may be obtained by noting that if $f(x)$ is convex then the map $(p_i) \to \sum_i f(p_i)$ is Schur-convex.
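The inequalities of this subsection can be spot-checked numerically. The sketch below is our illustration: it takes $\lambda^{\rho} = (0.7, 0.3, 0)$ and $p = (4/9, 3.1/9, 1.9/9)$, which satisfies $p \prec \lambda^{\rho}$ because $p_i = \sum_j u_{ij}^2 \lambda_j$ for the real orthogonal matrix $u$ with rows $(2/3, 2/3, 1/3)$, $(-2/3, 1/3, 2/3)$, $(1/3, -2/3, 2/3)$. The helper names `shannon` and `power_sum` are hypothetical.

```python
import math


def shannon(v):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in v if x > 0)


def power_sum(v, k=2):
    """sum_i v_i^k, Schur-convex for k > 1 on nonnegative vectors."""
    return sum(x ** k for x in v)


lam = [0.7, 0.3, 0.0]      # eigenvalues of rho (padded with a zero)
p = [4/9, 3.1/9, 1.9/9]    # ensemble probabilities with p majorized by lam

# Schur-convexity of sum_i x_i log x_i  <=>  H(p) >= S(rho) = H(lam)
print(shannon(p) >= shannon(lam))       # True
# Schur-convexity of sum_i x_i^k (k > 1): sum_i p_i^k <= sum_j lam_j^k
print(power_sum(p) <= power_sum(lam))   # True
```

Here $H(p) \approx 1.52$ bits while $S(\rho) \approx 0.88$ bits, so the entropy inequality holds with room to spare.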

C. Representation of bipartite pure states

A third application of Theorem 3 gives us insight into the properties of pure states of bipartite systems. We state the result formally as follows:

Corollary 4: Suppose $|\psi\rangle$ is a pure state of a composite system $AB$ with Schmidt decomposition

$$|\psi\rangle = \sum_i \sqrt{p_i}\, |i_A\rangle|i_B\rangle. \qquad (3.2)$$

Then given a probability distribution $(q_i)$ there exists an orthonormal basis $|i'_A\rangle$ for system $A$ and corresponding pure states $|\psi_i\rangle$ of system $B$ such that

$$|\psi\rangle = \sum_i \sqrt{q_i}\, |i'_A\rangle|\psi_i\rangle \qquad (3.3)$$

if and only if $(q_i) \prec (p_i)$.

In the statement of Corollary 4 it is understood that if $(q_i)$ contains more terms than $(p_i)$ then the vector $(p_i)$ should be extended by adding extra zeros. In the case where the number of terms in $(q_i)$ exceeds the number of dimensions of $A$'s Hilbert space, $A$'s Hilbert space must be extended so its dimension matches the number of terms in $(q_i)$.

Proof of Corollary 4:

To prove the forward implication, note that tracing out system $A$ in equations (3.2) and (3.3) gives $\sum_i p_i |i_B\rangle\langle i_B| = \sum_i q_i |\psi_i\rangle\langle\psi_i|$, and thus by Theorem 3, $(q_i) \prec (p_i)$. Conversely, suppose $|\psi\rangle$ has Schmidt decomposition given by (3.2), and that $(q_i) \prec (p_i)$. Let $\rho$ be the reduced density matrix of system $B$ when $A$ is traced out,

$$\rho = \mathrm{tr}_A(|\psi\rangle\langle\psi|) = \sum_i p_i |i_B\rangle\langle i_B|. \qquad (3.4)$$

By Theorem 3, $\rho = \sum_i q_i |\psi_i\rangle\langle\psi_i|$ for some set of pure states $|\psi_i\rangle$. The state $|\phi\rangle$ defined by

$$|\phi\rangle \equiv \sum_i \sqrt{q_i}\, |i_A\rangle|\psi_i\rangle \qquad (3.5)$$

is a purification of $\rho$, that is, a pure state of system $AB$ such that when system $A$ is traced out, $\mathrm{tr}_A(|\phi\rangle\langle\phi|) = \rho$. Thus $|\psi\rangle$ and $|\phi\rangle$ are both purifications of $\rho$. It can easily be shown [?] that there exists a unitary matrix $U$ acting on system $A$ such that $U|\phi\rangle = |\psi\rangle$. Defining $|i'_A\rangle \equiv U|i_A\rangle$ we see that

$$|\psi\rangle = \sum_i \sqrt{q_i}\, |i'_A\rangle|\psi_i\rangle, \qquad (3.6)$$

as claimed.
QED

D. Communication cost of entanglement transformation

Corollary 4 can be used to give insight into a recent result in the study of entanglement transformation [18]. Suppose Alice and Bob are in possession of an entangled pure state $|\psi\rangle$. They wish to transform this state into another pure state $|\phi\rangle$, with the restriction that they may only use local operations on their respective systems, together with a possibly unlimited amount of classical communication. It was shown in [18] that the transformation can be made if and only if $\lambda_\psi \prec \lambda_\phi$, where $\lambda_\psi$ denotes the vector of eigenvalues of the reduced density matrix of Alice's system when the joint Alice-Bob system is in the state $|\psi\rangle$, and $\lambda_\phi$ is defined similarly for the state $|\phi\rangle$.

To see how Corollary 4 applies in this context, suppose $|\psi\rangle$ and $|\phi\rangle$ are bipartite states with Schmidt decompositions

$$|\psi\rangle = \sum_i \sqrt{p_i}\, |i\rangle|i\rangle, \qquad (3.7)$$

$$|\phi\rangle = \sum_i \sqrt{q_i}\, |i\rangle|i\rangle, \qquad (3.8)$$

where without loss of generality we may assume the two states have the same Schmidt bases, since local unitary transformations can be used to inter-convert between different Schmidt bases. Note that $\lambda_\psi = (p_i)$ and $\lambda_\phi = (q_i)$. Suppose that $\lambda_\psi = (p_i) \prec \lambda_\phi = (q_i)$. By Corollary 4, and ignoring unimportant local unitary transformations, it is possible to write $|\psi\rangle$ and $|\phi\rangle$ in the form

$$|\psi\rangle = \sum_i \sqrt{p_i}\, |i\rangle|i\rangle, \qquad (3.9)$$

$$|\phi\rangle = \sum_i \sqrt{p_i}\, |i\rangle|\phi_i\rangle, \qquad (3.10)$$

for some set of pure states $|\phi_i\rangle$. This form makes it quite plausible that the state $|\psi\rangle$ can be transformed into the state $|\phi\rangle$ by local operations and classical communication: all that needs to be done is for Bob to transform $|i\rangle$ into $|\phi_i\rangle$ in such a way as to preserve coherence between different terms in the sum.
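For two qubits the form (3.9) and (3.10) can be written down explicitly. The following worked example is our own: take $|\psi\rangle$ maximally entangled, so $(p_i) = (1/2, 1/2)$, and $|\phi\rangle = \sqrt{q}\,|00\rangle + \sqrt{1-q}\,|11\rangle$ with $0 \le q \le 1$, so that $\lambda_\psi \prec \lambda_\phi$ automatically. Here $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$ and $H$ denotes the Hadamard gate on Alice's qubit (a local unitary).

```latex
|\psi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}},
\qquad
(H \otimes I)\,|\psi\rangle = \frac{|{+}\rangle|0\rangle + |{-}\rangle|1\rangle}{\sqrt{2}},

|\phi\rangle = \sqrt{q}\,|00\rangle + \sqrt{1-q}\,|11\rangle
  = \frac{|{+}\rangle|\phi_{+}\rangle + |{-}\rangle|\phi_{-}\rangle}{\sqrt{2}},
\qquad
|\phi_{\pm}\rangle = \sqrt{q}\,|0\rangle \pm \sqrt{1-q}\,|1\rangle .
```

So Bob must map $|0\rangle \mapsto |\phi_+\rangle$ and $|1\rangle \mapsto |\phi_-\rangle$ coherently. These target states are normalized but, for $0 < q < 1$, not orthogonal ($\langle\phi_+|\phi_-\rangle = 2q - 1$), so no unitary acting on Bob's qubit alone can do the job.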

I have not found a general method utilizing this fact to transform $|\psi\rangle$ into $|\phi\rangle$. However, it will now be shown how Corollary 4 can be applied successfully in the special case where $|\psi\rangle$ is a maximally entangled state of a $d$-dimensional system with a $d' \ge d$ dimensional system,

$$|\psi\rangle = \sum_{i=0}^{d-1} \frac{|i\rangle|i\rangle}{\sqrt{d}}. \qquad (3.11)$$



The new proof has the feature that it is exponentially more efficient, from the point of view of classical communication, than the protocol described in [18]. The argument runs as follows. By Corollary 4 we can find pure states $|\phi_i\rangle$ such that

$$|\phi\rangle = \sum_{i=0}^{d-1} \frac{|i\rangle|\phi_i\rangle}{\sqrt{d}}, \qquad (3.12)$$



up to local unitary transformations. Define an operator on Bob's system,

$$F \equiv \sum_{i=0}^{d-1} |\phi_i\rangle\langle i|. \qquad (3.13)$$

Ideally, we'd apply $F$ to system $B$, taking $|\psi\rangle$ directly to $|\phi\rangle$. This doesn't work because $F$ isn't unitary. Instead, we use $F$ to define a quantum measurement with essentially the same effect. Define

$$E \equiv \frac{F}{\sqrt{\mathrm{tr}(F^\dagger F)}}. \qquad (3.14)$$



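Let $|0\rangle, \ldots, |d-1\rangle$ be the Schmidt basis for Bob's system. Define operators $X$ and $Z$ by

$$X|j\rangle = |j \oplus 1\rangle, \qquad Z|j\rangle = \omega^{j} |j\rangle, \qquad (3.15)$$

where $\oplus$ denotes addition modulo $d$, and $\omega$ is a primitive $d$th root of unity (for example $\omega = e^{2\pi i/d}$). Define unitary operators $U_{s,t}$ by

$$U_{s,t} \equiv X^{s} Z^{t}. \qquad (3.16)$$

The indices $s$ and $t$ are integers in the range $0$ to $d-1$.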
By checking on an operator basis and applying linearity it is easily verified that for any Hermitian $A$,

$$\frac{1}{d} \sum_{s,t=0}^{d-1} U_{s,t}^\dagger A\, U_{s,t} = \mathrm{tr}(A)\, I.$$
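The identity above is easy to confirm numerically. The following pure-Python sketch is our illustration: it assumes $\omega = e^{2\pi i/d}$, represents operators as nested lists of complex numbers, and uses a hypothetical name `twirl` for the left-hand side.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def dagger(A):
    return [[A[c][r].conjugate() for c in range(len(A))] for r in range(len(A[0]))]


def twirl(A):
    """(1/d) sum_{s,t} U_st^dagger A U_st with U_st = X^s Z^t, as in (3.15)-(3.16).

    Sketch with omega = exp(2 pi i / d); A is a d x d list of complex numbers.
    """
    d = len(A)
    w = cmath.exp(2j * math.pi / d)
    X = [[complex(r == (c + 1) % d) for c in range(d)] for r in range(d)]  # X|c> = |c+1 mod d>
    Z = [[w ** r if r == c else 0j for c in range(d)] for r in range(d)]   # Z|r> = w^r |r>
    total = [[0j] * d for _ in range(d)]
    Xs = [[complex(r == c) for c in range(d)] for r in range(d)]           # X^0 = I
    for s in range(d):
        U = Xs                                                             # X^s Z^0
        for t in range(d):
            term = matmul(matmul(dagger(U), A), U)
            total = [[total[r][c] + term[r][c] for c in range(d)] for r in range(d)]
            U = matmul(U, Z)                                               # X^s Z^(t+1)
        Xs = matmul(Xs, X)                                                 # X^(s+1)
    return [[x / d for x in row] for row in total]


A = [[2, 1 - 1j, 0], [1 + 1j, 0.5, 3j], [0, -3j, -1]]  # Hermitian, tr(A) = 1.5
A = [[complex(x) for x in row] for row in A]
T = twirl(A)  # should equal 1.5 * I up to rounding error
```

Summing over $t$ removes the off-diagonal entries (it dephases $A$ in the Schmidt basis), and summing over $s$ then spreads the diagonal uniformly, which is why only $\mathrm{tr}(A)$ survives.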



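Therefore, defining $E_{s,t} \equiv E\,U_{s,t}/\sqrt{d}$ gives

$$\sum_{s,t} E_{s,t}^\dagger E_{s,t} = \mathrm{tr}(E^\dagger E)\, I = I, \qquad (3.17)$$

so the operators $E_{s,t}$ describe a measurement that Bob can perform on his system. Using $\mathrm{tr}(F^\dagger F) = d$, which holds because the $|\phi_i\rangle$ are normalized, the action on $|\psi\rangle$ is

$$E_{s,t}|\psi\rangle = \frac{1}{d} \sum_{i=0}^{d-1} \omega^{ti}\, \frac{|i\rangle|\phi_{i \oplus s}\rangle}{\sqrt{d}}, \qquad (3.18)$$

so each outcome $(s,t)$ occurs with probability $1/d^2$ and leaves the state

$$\sum_{i=0}^{d-1} \omega^{ti}\, \frac{|i\rangle|\phi_{i \oplus s}\rangle}{\sqrt{d}}. \qquad (3.19)$$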



Bob sends the measurement result to Alice, which requires $\lceil 2\log_2 d \rceil$ bits of communication, and then Alice performs $X^{s} Z^{-t}$ (where $X$ and $Z$ are now defined with respect to Alice's Schmidt basis) on her system, giving the state

$$\sum_{i=0}^{d-1} \frac{|i \oplus s\rangle|\phi_{i \oplus s}\rangle}{\sqrt{d}}, \qquad (3.20)$$

which is just $|\phi\rangle$.
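To check the algebra of the protocol end to end, here is a small state-vector simulation in Python. It is our sketch, not code from the paper, and it makes several assumptions explicitly: Bob's dimension equals $d$, $\omega = e^{2\pi i/d}$, the measurement operators are $E_{s,t} = E\,U_{s,t}/\sqrt{d}$ with $E = F/\sqrt{\mathrm{tr}(F^\dagger F)}$, and a bipartite state is stored as the coefficient matrix `C[a][b]` of $|a\rangle|b\rangle$. The function name `run_protocol` and the sample states are hypothetical.

```python
import cmath
import math


def eye(d):
    return [[complex(r == c) for c in range(d)] for r in range(d)]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def matpow(M, n):
    R = eye(len(M))
    for _ in range(n):
        R = matmul(R, M)
    return R


def transpose(A):
    return [list(col) for col in zip(*A)]


def dagger(A):
    return [[x.conjugate() for x in row] for row in transpose(A)]


def run_protocol(phis):
    """Simulate Bob's measurement and Alice's correction for every outcome (s, t).

    phis: d normalized states |phi_i> of Bob's system (Bob's dimension = d here).
    A bipartite state is a matrix C with C[a][b] = coefficient of |a>|b>, so an
    operator O on Bob acts as C -> C O^T and an operator on Alice as C -> O C.
    Returns a list of (probability, corrected state matrix) pairs.
    """
    d = len(phis)
    w = cmath.exp(2j * math.pi / d)
    X = [[complex(r == (c + 1) % d) for c in range(d)] for r in range(d)]
    Z = [[w ** r if r == c else 0j for c in range(d)] for r in range(d)]
    F = [[complex(phis[i][c]) for i in range(d)] for c in range(d)]  # F = sum_i |phi_i><i|
    norm = math.sqrt(sum(abs(x) ** 2 for row in F for x in row))     # sqrt(tr F^dagger F)
    E = [[x / norm for x in row] for row in F]
    psi = [[complex(a == b) / math.sqrt(d) for b in range(d)] for a in range(d)]
    results = []
    for s in range(d):
        for t in range(d):
            E_st = [[x / math.sqrt(d) for x in row]
                    for row in matmul(E, matmul(matpow(X, s), matpow(Z, t)))]
            post = matmul(psi, transpose(E_st))               # Bob obtains outcome (s, t)
            prob = sum(abs(x) ** 2 for row in post for x in row)
            post = [[x / math.sqrt(prob) for x in row] for row in post]
            fix = matmul(matpow(X, s), matpow(dagger(Z), t))  # Alice applies X^s Z^(-t)
            results.append((prob, matmul(fix, post)))
    return results


d = 3
phis = [[1, 0, 0], [0.6, 0.8, 0], [0, 0.6, 0.8j]]  # arbitrary normalized states
target = [[complex(phis[i][c]) / math.sqrt(d) for c in range(d)] for i in range(d)]
results = run_protocol(phis)
```

Every outcome occurs with probability $1/d^2$, and after Alice's correction the state equals $\sum_i |i\rangle|\phi_i\rangle/\sqrt{d}$ for any normalized $|\phi_i\rangle$; Corollary 4 is what guarantees that the desired target $|\phi\rangle$ can be put in this form.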

This protocol for entanglement transformation requires only $\lceil 2\log_2 d \rceil$ bits of communication, compared with the protocol in [18], which required $d - 1$ bits. Another method [19] for achieving this result is as follows: Alice prepares locally a system $A'B'$ in a copy of $|\phi\rangle$. She then uses the shared maximal entanglement with Bob to teleport [20] system $B'$ to Bob, creating the desired state $|\phi\rangle$. Again, this protocol requires $\lceil 2\log_2 d \rceil$ bits of communication.

The present approach is interesting in that it does not require knowledge of the teleportation protocol in order to succeed. Moreover, the method used strongly suggests that it may be possible to always perform the transformation using $O(\log_2 d)$ bits of communication, even when $|\psi\rangle$ is not maximally entangled, a result that does not appear obvious from the teleportation protocol. A method for doing so has recently been found using different techniques, and will be reported elsewhere.



IV. CONCLUSION 



The results reported here answer a fundamental question about the nature of the density matrix as a representation for ensembles of pure states, and give some elementary applications of this result to quantum mechanics and quantum information theory. I expect that the connection revealed here between majorization and ensembles of pure states will be of considerable use in future investigations of fundamental properties of quantum systems.
APPENDIX A: UNITARY-STOCHASTIC
MATRICES AND MAJORIZATION 

In this appendix we outline the constructive steps in the proof of Theorem 1. To begin, we first take a slight detour connecting majorization with a class of matrices known as T-transforms.

By definition, a T-transform is a matrix which acts as the identity on all but 2 dimensions, where it has the form

$$T = \begin{pmatrix} t & 1-t \\ 1-t & t \end{pmatrix} \qquad (A1)$$

for some parameter $t$, $0 \le t \le 1$. The following result connects majorization and T-transforms [?]:

Theorem 5: If $x \prec y$ there exists a finite set of T-transforms $T_1, T_2, \ldots, T_n$ such that $x = T_1 T_2 \ldots T_n y$.

The converse of Theorem 5 is also true [?], but will not be needed. For convenience we provide details of the construction of the sequence $T_1, \ldots, T_n$ here.

Proof of Theorem 5: 

The result is proved by induction on $d$, the dimension of the vector space in which $x$ and $y$ live. For notational convenience we assume that the components of $x$ and $y$ have been ordered into decreasing order; if this is not the case then one can easily reduce to this case by insertion of appropriate transposition matrices (which are T-transforms with $t = 0$). The result is clear when $d = 2$, so let us assume the result is true for some $d \ge 2$, and try to prove it for $(d+1)$-dimensional $x$ and $y$.

Choose $k$ such that $y_k \le x_1 \le y_{k-1}$. Such a $k$ is guaranteed to exist because $x \prec y$ implies that $x_1 \le y_1$ and $x_1 \ge x_{d+1} \ge y_{d+1}$. Choose $t$ such that

$$x_1 = t y_1 + (1-t) y_k. \qquad (A2)$$

Now define $z$ to be the result of applying a T-transform $T$ with parameter $t$ to the 1st and $k$th components of $y$, so that

$$z \equiv Ty \qquad (A3)$$

$$= (x_1, y'), \qquad (A4)$$

where

$$y' \equiv \bigl(y_2, \ldots, y_{k-1}, (1-t) y_1 + t y_k, y_{k+1}, \ldots, y_{d+1}\bigr). \qquad (A5)$$

Define $x' \equiv (x_2, x_3, \ldots, x_{d+1})$. It is not difficult to verify that $x' \prec y'$ (see [?] for details), and thus by the inductive hypothesis, $x' = T_1 \ldots T_r y'$ for some sequence of T-transforms in $d$ dimensions. But these T-transforms can equally well be regarded as T-transforms on $d+1$ dimensions by acting as the identity on the first dimension, and thus $x = T_1 \ldots T_r T y$; that is, $x$ can be obtained from $y$ by a finite sequence of T-transforms, as we set out to show.
QED 
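A single numerical pass through the inductive step may help. The vectors below are our own illustrative choice, $y = (0.6, 0.3, 0.1)$ and $x = (0.4, 0.35, 0.25)$, for which $x \prec y$ (partial sums $0.4 \le 0.6$, $0.75 \le 0.9$, and $1 = 1$).

```latex
% y_2 = 0.3 <= x_1 = 0.4 <= y_1 = 0.6, so k = 2 in (A2)
0.4 = 0.6\,t + 0.3\,(1-t) \;\Longrightarrow\; t = \tfrac{1}{3},
\qquad
z = Ty = \bigl(0.4,\; \tfrac{2}{3}(0.6) + \tfrac{1}{3}(0.3),\; 0.1\bigr) = (0.4,\; 0.5,\; 0.1),

y' = (0.5,\; 0.1), \qquad x' = (0.35,\; 0.25), \qquad
0.35 = 0.5\,t' + 0.1\,(1-t') \;\Longrightarrow\; t' = 0.625 .
```

Here $x' \prec y'$ as claimed, and a second T-transform $T'$ with parameter $t' = 0.625$ acting on the last two components gives $x = T' T y$, using $n = 2 = d - 1$ T-transforms.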



Note that the inductive step of the proof of Theorem 5 can immediately be converted into an iterative procedure for constructing the matrices $T_1, \ldots, T_n$, and also implies that $n = d - 1$ in a $d$-dimensional space. The proof of Theorem 1, which we now give, is also inductive in nature, and is easily converted into an iterative procedure for constructing an orthogonal matrix $u = (u_{ij})$ such that $D$ defined by $D_{ij} \equiv u_{ij}^2$ satisfies Theorem 1. Note again the convention that expressions like $u_{ij}^2$ represent the square of the real number $u_{ij}$, not the $ij$th component of the matrix $u^2$.

To prove Theorem 1 we use the decomposition $x = T_1 T_2 \ldots T_n y$ from the proof of Theorem 5. The strategy is to use induction on $n$ to prove that $T_1 T_2 \ldots T_n = (W_{ij}^2)$ for some orthogonal matrix $W$. Suppose $n = 1$. Omitting components on which $T_1$ acts as the identity, we have

$$T_1 = \begin{pmatrix} t & 1-t \\ 1-t & t \end{pmatrix} \qquad (A6)$$

for some $t$, $0 \le t \le 1$. Define a real orthogonal matrix $U$ to act as the identity on all components on which $T_1$ acts as the identity, and as

$$U = \begin{pmatrix} \sqrt{t} & -\sqrt{1-t} \\ \sqrt{1-t} & \sqrt{t} \end{pmatrix} \qquad (A7)$$

on the components where $T_1$ acts non-trivially. It is clear that $T_1 = (U_{ij}^2)$, as required.

To do the inductive step, suppose that products of $n$ T-transforms of the form used in the proof of Theorem 5 are ortho-stochastic, and consider the product $T_1 T_2 \ldots T_{n+1}$. We assume $T_{n+2-k}$ acts on component $k$ and component $d_k > k$, as per the proof of Theorem 5. Let $P$ be the permutation matrix which transposes components 2 and $d_1$. (The following proof is more transparent if one assumes that $d_1 = 2$ and drops all reference to $P$, which is a technical device to make certain equations more compact.) Then

$$P T_{n+1} P = \begin{pmatrix} t & 1-t & 0 \\ 1-t & t & 0 \\ 0 & 0 & I_{d-2} \end{pmatrix}, \qquad (A8)$$

where $I_{d-2}$ is the $d-2$ by $d-2$ identity matrix. Furthermore, let us define a $d-1$ by $d-1$ matrix $A$ by

$$T_1 T_2 \ldots T_n = \begin{pmatrix} 1 & 0 \\ 0 & A \end{pmatrix}. \qquad (A9)$$

By the inductive hypothesis there is a $d-1$ by $d-1$ orthogonal matrix $U$ such that $A_{ij} = U_{ij}^2$. Define a new matrix $U'$ by interchanging the role of the first and $(d_1 - 1)$th co-ordinates in $U$, $U' = P'UP'$, where $P'$ transposes the first and $(d_1 - 1)$th co-ordinates, and similarly define $A'$ by $A' = P'AP'$. Then $A'_{ij} = U'^{2}_{ij}$. Also we have

$$P T_1 T_2 \ldots T_n P = \begin{pmatrix} 1 & 0 \\ 0 & A' \end{pmatrix}. \qquad (A10)$$



Multiplying the previous equation on the right by $P T_{n+1} P$ gives, from (A8) and the identity $P^2 = I$,

$$P T_1 T_2 \ldots T_{n+1} P = \begin{pmatrix} t & 1-t & 0 \\ (1-t)\,\delta & t\,\delta & \tilde{A} \end{pmatrix}, \qquad (A11)$$

where $\delta$ is the first column of $A'$, and $\tilde{A}$ is the $d-1$ by $d-2$ matrix that results when the first column of $A'$ is removed. Let $\tilde{U}$ denote the $d-1$ by $d-2$ matrix that results when the first column of $U'$ is removed, and let $u$ denote the first column of $U'$. Define a $d$ by $d$ matrix $V$ by

$$V \equiv \begin{pmatrix} \sqrt{t} & -\sqrt{1-t} & 0 \\ \sqrt{1-t}\,u & \sqrt{t}\,u & \tilde{U} \end{pmatrix}. \qquad (A12)$$

We claim that $V$ is an orthogonal matrix. To see this we need to show that the columns of $V$ are of unit length and orthogonal. Since $u$ is a column of the orthogonal matrix $U'$, $u \cdot u = 1$, so the length of the first column is

$$\sqrt{t + (1-t)\, u \cdot u} = \sqrt{1} = 1. \qquad (A13)$$

A similar calculation shows that the second column is of unit length. The remaining columns are all of unit length since, apart from a leading zero, they are columns of the orthogonal matrix $U'$. Simple algebra along similar lines can be used to check that the correct orthogonality relations between columns of $V$ are satisfied. Observe that $P T_1 T_2 \ldots T_{n+1} P = (V_{ij}^2)$, so if we define $W = PVP$, we see that $W$ is an orthogonal matrix such that $T_1 T_2 \ldots T_{n+1} = (W_{ij}^2)$, which completes the induction.
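The inductive construction of this appendix translates directly into a recursive procedure. The Python sketch below is our own rendering, not the paper's code: instead of the permutation $P$ it writes the block structure of (A12) in place, handling column $k$ of the T-transform from the proof of Theorem 5 explicitly, and it reduces unsorted inputs to the sorted case by permuting rows and columns of $W$. The function names and the tolerance `tol` are hypothetical, and the input is assumed to satisfy $x \prec y$.

```python
import math


def ortho_stochastic_for(x, y, tol=1e-12):
    """Return a real orthogonal W with x_i = sum_j W[i][j]**2 * y[j], assuming x ≺ y."""
    d = len(y)
    # Reduce to decreasingly ordered x and y by permuting rows and columns of W.
    xi = sorted(range(d), key=lambda i: -x[i])
    yi = sorted(range(d), key=lambda j: -y[j])
    Ws = _sorted_case([x[i] for i in xi], [y[j] for j in yi], tol)
    W = [[0.0] * d for _ in range(d)]
    for a, i in enumerate(xi):
        for b, j in enumerate(yi):
            W[i][j] = Ws[a][b]
    return W


def _sorted_case(x, y, tol):
    d = len(y)
    if d == 1:
        return [[1.0]]
    # Choose k (0-based, k >= 1) with y[k] <= x[0] <= y[k-1], as in the proof of Theorem 5.
    k = next((j for j in range(1, d) if y[j] <= x[0] + tol), d - 1)
    # T-transform parameter from (A2): x[0] = t*y[0] + (1-t)*y[k].
    t = 1.0 if abs(y[0] - y[k]) < tol else (x[0] - y[k]) / (y[0] - y[k])
    t = min(1.0, max(0.0, t))
    # z = T y as in (A3)-(A5); dropping z[0] = x[0] leaves y'.
    z = list(y)
    z[0], z[k] = t * y[0] + (1 - t) * y[k], (1 - t) * y[0] + t * y[k]
    Wp = ortho_stochastic_for(x[1:], z[1:], tol)  # orthogonal matrix for x' ≺ y'
    W = [[0.0] * d for _ in range(d)]
    W[0][0], W[0][k] = math.sqrt(t), -math.sqrt(1 - t)  # first row of (A12)
    for i in range(1, d):
        for j in range(1, d):
            W[i][j] = Wp[i - 1][j - 1]
        c = Wp[i - 1][k - 1]  # split column k of W' between columns 0 and k, as in (A12)
        W[i][0] = math.sqrt(1 - t) * c
        W[i][k] = math.sqrt(t) * c
    return W


W = ortho_stochastic_for([0.4, 0.35, 0.25], [0.6, 0.3, 0.1])
```

The orthogonality argument is the one given above: the first two columns have unit length because $t + (1-t) = 1$, and every other inner product reduces to an inner product of columns of the smaller orthogonal matrix $W'$.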